Troubleshooting an IBM Sametime/Lotus Domino server crash

Added by ~Mary Desnugenings | Edited by ~Rebecca Bubveluzen on November 16, 2012 | Version 4
The IBM Sametime Community Server is the last Sametime component that still runs on the IBM Lotus Domino Server, and diagnosing why IBM Sametime causes Lotus Domino to crash takes a good skill set. This article explains the basics of troubleshooting IBM Sametime Community Server crashes.
Tags: Community Server
  • 1 Introduction
  • 2 Understanding the Sametime components
  • 3 Main databases used by IBM Sametime
    • 3.1 Main processes of IBM Sametime
  • 4 Debugging tools & parameters
    • 4.1 Sametime Diagnostics Collector (stdiagzip)
    • 4.2 Collecting a manual NSD
  • 5 Sametime-specific diagnostics
    • 5.1 Controlling Sametime logging
    • 5.2 Sametime debugging on Windows
    • 5.3 Sametime debugging on Linux/Solaris/UNIX
    • 5.4 Sametime debugging on iSeries
    • 5.5 Enable full debugging on a single Sametime component
    • 5.6 Sametime Runtime Debug tool
  • 6 Reviewing the NSD and Sametime logging together
  • 7 Conclusion
  • 8 Tell us what you think
  • 9 Resources
  • 10 About the author

Introduction


IBM Sametime® reacts to a crash in the same way as a standard IBM® Lotus® Domino® server: NSD is triggered. When Lotus Domino detects that IBM Sametime is not responding correctly, or that a process has terminated or crashed (for example, because of memory limitations or database corruption), nsd.exe is launched and the stack data is collected.

This Notes System Diagnostics (NSD) file is created in the IBM_TECHNICAL_SUPPORT directory, which is located under the Domino Data directory.

As with a normal Domino server crash, the Domino server generally crashes on a specific Sametime task, and the investigation starts with that task. It is still a good idea, however, to confirm that this task is the root cause.

Understanding the Sametime components


Here we discuss the main databases, applications, and services that are native to IBM Sametime. From a crash perspective, it is good to understand how these function and how they could cause the Sametime Server to crash.

Main databases used by IBM Sametime


Stconfig.nsf. All the configuration information for the Sametime Server is set here for Sametime versions 7.5.x, 8.0.x, and 8.5.x. Note that, if 8.5.x is deployed using a Deployment plan, then the SSC is used for this configuration, but changes are pushed to the stconfig.nsf.
  • Known issue with Stconfig.nsf: Documents inside the stconfig.nsf can get corrupted, so it's recommended to have a good backup of this database.
Stconf.nsf. Holds and stores all Sametime Classic Meeting information, including scheduled or completed meetings that have taken place or that will take place.
  • Known Issues with Stconf.nsf: If this database becomes too large, it can cause the Sametime Classic Meetings component to hang or for the Web browser to hang, and meetings might not launch.
  • Best practice here is to use the purge agent that deletes old meeting information. It can be configured to delete by age, for example, to delete meetings that are 40 days old, so that the database stays at a manageable size.
    For more details, refer to “Maintaining the Sametime Meeting Center” (this documentation is still valid for releases of Sametime later than 7.5).
Vpuserinfo.nsf. Stores the entire Contacts list of each user on the Sametime Community Server.
  • Known issues with Vpuserinfo.nsf: The documents or views in this database can become corrupted and have been known to cause the Sametime Server to crash. A side effect of the vpuserinfo.nsf being corrupted is that you could have issues with public/private groups in users' Contacts lists.
Refer to IBM Support Technote #1092722, “Sametime Connect users have issues with contact lists,” for details on the correct maintenance switches you should use for fixup, updall, and compact.
Stlog.nsf. This is the same as the log.nsf but shows more detailed, Sametime-specific information.
  • Known issues with Stlog.nsf: You need to maintain the size of this database, using the purge agent that you can configure per Technote #1279834, “Sametime Log database (stlog.nsf) can cause server crashes if it grows too large.”
If this database becomes corrupted or needs to be replaced, you cannot simply delete it and depend on the Domino server to recreate it upon restart. You must manually recreate this database with the stlog.ntf template. More details on how to do this are in Technote #1097511, “Does the Sametime STLog.nsf get recreated if it is deleted?”

Main processes of IBM Sametime


Table 1 provides a high-level description of the functionality of each of the individual processes or applications that run within IBM Sametime. If you know this information and have a crash on one of these services, it could help you find a reproducible scenario or understand from where the crash is originating.

Table 1. Sametime processes and their functionalities



Debugging tools & parameters


In this section we discuss the different methods and levels of debugging that can be activated on the Sametime Community Server. Before proceeding, however, note these folder and file locations:
  • Sametime Trace Folder. By default on all platforms this folder is located under the Domino Data directory. Keep this directory cleared out as much as possible; if you let it grow too large (2 GB or greater), it can cause the server to crash.
  • IBM_TECHNICAL_SUPPORT. By default on all platforms this folder is located under the Domino Data directory.
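
To keep an eye on the 2 GB guideline for the Trace Folder, a small script can total the size of the files in it. The following is a minimal Python sketch, not an IBM-supplied tool; the function names are hypothetical, and you pass in the path to your own trace folder.

```python
import os


def folder_size_bytes(path):
    """Return the total size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # The file vanished or is unreadable; skip it.
                pass
    return total


def trace_folder_too_large(path, limit_bytes=2 * 1024 ** 3):
    """Return True when the folder has reached the limit (2 GB by default)."""
    return folder_size_bytes(path) >= limit_bytes
```

For example, trace_folder_too_large(r"C:\path\to\data\trace") returns True once the folder reaches 2 GB; the path shown is a placeholder.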

Sametime Diagnostics Collector (stdiagzip)


The stdiagzip script collects Sametime debugging information and adds it to a .zip file that, by default, is created in the Trace Folder. The steps below show how to launch it from the command line on Microsoft® Windows®; the steps are the same for Linux®.

To run the stdiagzip.bat on Windows, click Start > Run, and type “cmd” to open a Command prompt (see figure 1).

Figure 1. Command prompt


The stdiagzip.bat is located under the Domino program directory [C:\Program Files (x86)\ibm\Lotus\Domino\stdiagzip.bat], as shown in figure 2.

Figure 2. stdiagzip.bat location



You can then see the files being collected and added to the .zip file that will be created by default under the Trace Folder.

To change any of the preferences of the stdiagzip.bat file you must edit the stdiagzip.properties file, which is also located under the Domino directory. Here you can remove or add what gets collected in the stdiagzip.

Note that the bigger the trace folder, the longer it takes for this script to complete. If you feel that it is taking too long, check the main databases that are being collected. These two are added by default:
  • file_10=C:/Program Files (x86)/IBM/Lotus/Domino/data/stConfig.nsf
  • file_11=C:/Program Files (x86)/IBM/Lotus/Domino/data/stlog.nsf
In most cases the stlog.nsf can be quite large, and you can either comment out its line (file_11 above) in the stdiagzip.properties file or use the steps mentioned above for recreating the stlog.nsf.
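
Commenting out the stlog.nsf entry can also be scripted. The sketch below is an illustration only: it assumes that stdiagzip.properties treats a leading "#" as a comment (as Java-style properties files do), which you should confirm against your own copy of the file, and the function name is hypothetical.

```python
def comment_out_entries(properties_text, filename_fragment):
    """Prefix with '#' every active key=value line whose value mentions filename_fragment.

    Assumption: '#' marks a comment line in stdiagzip.properties.
    """
    result = []
    for line in properties_text.splitlines():
        stripped = line.lstrip()
        if not stripped.startswith("#") and "=" in stripped:
            _key, value = stripped.split("=", 1)
            if filename_fragment.lower() in value.lower():
                line = "#" + line
        result.append(line)
    return "\n".join(result)
```

Lines that are already commented out are left alone, so running the function twice gives the same result as running it once.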

Here are the steps on how to run these commands on other operating systems:

To run the commands on AIX®/Linux/Solaris, use:
/local/notesdata> sh stdiagzip.sh
To run the commands on IBM i, call:
QSAMETIME/STDIAGZIP servername
On each of these operating systems, the .zip file generated by the stdiagzip program is created in the data_dir/trace directory. Note that this file not only includes all the Sametime diagnostics but also copies the contents of the IBM_TECHNICAL_SUPPORT directory, so any manual or server-generated NSD files are also captured in it.

NOTE: The importance of collecting the stdiagzip for crash issues and other Sametime-related problems cannot be overstated; it tells us a lot more than you might expect, such as:
  • Domino version (including fixpacks)
  • OS version (including fixpacks and bit build)
  • Sametime Server version (including fixpacks)
  • Network information
  • Sametime and Domino cluster information
  • Start and Stop times of the Sametime Server

Collecting a manual NSD


In some cases, Sametime might crash but not terminate in such a way that an NSD is triggered. This could occur when a non-critical service crashes or terminates abnormally. Sometimes we can see this in the Domino server console.log or the log.nsf, for example:
  • STPolicy.exe has terminated abnormally
  • STUsers.exe has terminated abnormally
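
Messages like these can be picked out of a saved console log with a short script. This is a minimal Python sketch that assumes only the message wording shown above ("<process> has terminated abnormally"); the function name is hypothetical, and the rest of each log line is ignored.

```python
import re

# Matches "<process> has terminated abnormally" anywhere in a line.
ABNORMAL = re.compile(r"(\S+)\s+has terminated abnormally", re.IGNORECASE)


def find_abnormal_terminations(log_lines):
    """Return (line_number, process_name) for each abnormal-termination message."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        match = ABNORMAL.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits
```

Passing the lines of console.log to find_abnormal_terminations gives the line numbers to start reading from.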
If the NSD is not automatically collected, then you can run NSD manually to get more data, following these steps (for Windows):
  1. Click Start > Run and type “cmd” to open a Command prompt.
  2. Navigate to where your Domino Server that is running Sametime is installed. By default, when running on 64-bit Windows, it is located under C:\Program Files (x86)\IBM\Lotus, as shown in figure 3 (for 32-bit, it will be under C:\Program Files\IBM\Lotus).
Figure 3. Domino install location



  3. Once you have located the Lotus Directory, you need to “cd” to the Domino Directory where the nsd.exe is located (see figure 4).

Figure 4. “cd” to Domino Directory


  4. Now you simply run the nsd command, as shown in figure 5.

Figure 5. nsd command



Once this command is issued, you can see it start collecting data from the server. Note that this can take some time, depending on the current load on the Domino server.

If you are using any OS version earlier than Windows 2003 SP1, you must use the -detach switch, nsd -detach, to create the NSD and then safely detach from the Sametime and Domino processes.

You can also use nsd -monitor, which puts NSD in monitor mode. This is useful if you know the crash occurs during a specific time frame during the day or late at night: NSD detects when the server is panicking or in a fatal state and collects the NSD file. Once this command is issued, you get an nsd> command prompt and can issue the switches shown in figure 6.

Figure 6. nsd> switches


To see a list of NSD options and commands, open "nsddoc.html" in the Data/Help directory.

Sametime-specific diagnostics


The debugging steps in this section are the methods that Support uses when dealing with Sametime crash customer PMRs. If the crash is reproducible and occurs often enough, or we know the steps that can trigger the crash, then these are the best ways to get the correct data to Support so that a Sametime crash PMR can be resolved expeditiously.

This section includes the debugging that administrators should enable on their Sametime Server when they experience a Sametime Server crash. On a Domino level it is good to enable Console logging and Debug Thread ID logging by adding:
  • console_log_enabled=1
  • debug_threadID=1
to the Notes.ini (this requires a restart of the Domino instance on which Sametime is running). The Notes.ini is located under the Domino Directory of the Sametime Server.

Or, you can use the set config command in the Domino Server console, for example, as shown in figure 7. In this case it had already been added to the Notes.ini; hence the reason for the “...already ENABLED” message.

Figure 7. set config command
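
If you prefer to prepare the Notes.ini change with a script, the following minimal Python sketch adds or updates key=value lines in the text of the file. It assumes the Notes.ini is a flat list of key=value lines and that parameter names are not case-sensitive; the function name is hypothetical, and you should back up the Notes.ini before writing any changes to it.

```python
def ensure_settings(ini_text, settings):
    """Return ini_text with every key=value pair in settings present exactly once."""
    lines = ini_text.splitlines()
    remaining = {key.lower(): (key, value) for key, value in settings.items()}
    for index, line in enumerate(lines):
        if "=" not in line:
            continue
        key = line.split("=", 1)[0].strip().lower()
        if key in remaining:
            # The parameter already exists; overwrite its value in place.
            name, value = remaining.pop(key)
            lines[index] = f"{name}={value}"
    # Append any parameters that were not found.
    for name, value in remaining.values():
        lines.append(f"{name}={value}")
    return "\n".join(lines) + "\n"
```

For example, ensure_settings(text, {"console_log_enabled": "1", "debug_threadID": "1"}) returns the updated text, which you can then save and activate by restarting Domino.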

Controlling Sametime logging


Trace logs can grow quickly once the logging levels are increased, filling up valuable disk space. Two Sametime.ini parameters can be added under the [Debug] section of the Sametime.ini to control their size:
  • ST_TRACEFILE_SIZE. Sets the maximum size of each trace file (in MB).
  • ST_TRACEFILE_CNT. Sets the number of trace files generated per Sametime service application.
The product ST_TRACEFILE_SIZE * ST_TRACEFILE_CNT is the maximum disk space that the trace files can use per Sametime service application.
For example, change the setting in the Sametime.ini file under the [Debug] section as follows:
ST_TRACEFILE_SIZE=20
ST_TRACEFILE_CNT=50
With these settings, 20 x 50 = 1000 MB is the maximum disk space that each Sametime service application consumes for its trace files.
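
The same arithmetic can be used to choose the values from a disk budget. This is a small Python sketch; the function names are hypothetical, and the number of services that actually write trace files depends on your server.

```python
def max_trace_mb(size_mb, count, services=1):
    """Maximum trace disk usage in MB: file size x file count x number of services."""
    return size_mb * count * services


def count_for_budget(budget_mb, size_mb):
    """Largest ST_TRACEFILE_CNT that keeps one service within budget_mb."""
    return budget_mb // size_mb
```

With the settings above, max_trace_mb(20, 50) gives 1000 MB for one service application, and three such services could use up to 3000 MB in total.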

Sametime debugging on Windows


To do this:
  1. Stop the Domino/Sametime server.
  2. In the Sametime.ini under [Debug], set VP_TRACE_ALL=1 (for 7.5.x and 8.0.x).
  3. Open regedit, go to HKEY_LOCAL_MACHINE - SOFTWARE - LOTUS - SAMETIME - MEETINGSERVER - DIAGNOSTICS, and change LogPrintLevel=8 to LogPrintLevel=100.
  4. Go to the IBM_TECHNICAL_SUPPORT directory and clear the contents, then go to the Trace directory and clear the contents.
  5. Restart Domino/Sametime, reproduce the issue, and run stdiagzip.bat.

Sametime debugging on Linux/Solaris/UNIX


To do this:
  1. Repeat steps 1 and 2 from the above section for Windows.
  2. In the Meetingserver.ini, search for LogPrintLevel=8, and change it to LogPrintLevel=16.
  3. Run the command “ststart resetlogs”.
  4. Restart Domino/Sametime and, once the server is up and the issue has been reproduced, run stdiagzip.sh.

Sametime debugging on iSeries


To do this:

1. Shut down the Sametime/Domino Server, go to the trace directory, and delete the contents of this folder, using the following two commands:
RMVLNK '/<data dir>/trace/*.txt'
RMVLNK '/<data dir>/trace/*.diag'
2. Go to the IBM_TECHNICAL_SUPPORT directory and delete the contents of this folder also. NOTE: DO NOT delete the directory, only the contents.

3. Locate the Sametime.ini and add the following parameter under the [Debug] section:
VP_TRACE_ALL=1 (7.5.x and 8.0.x)
4. Once all this debugging is added, launch the Sametime/Domino Server. You will then need to run “call QSAMETIME/STDIAGZIP servername”.

Enable full debugging on a single Sametime component


We have seen that adding VP_TRACE_ALL=1 collects all Sametime debug from all the services/applications; however, this is not very efficient and has been known to cause performance problems with newer versions of Sametime.

As of version 8.5, IBM Sametime includes a feature out of the box with which we can determine which service/application is causing the Sametime Server to crash.

For example, if we are interested in only those trace flags for the Stcommunity.exe application, we can use this tool to enable only STCommunity full traces. We can add a new debug section for this purpose in the Sametime.ini, so that only traces of STCommunity.exe will be generated:
[Debug-STCommunity]
VP_TRACE_ALL=1
VP_SNIFF=0
UCM_SNIFF=0
VP_DELAY_SNIFF=0
UCM_DELAY_SNIFF=0
You can add this functionality to a Sametime 8.0.2 server, but you must install Hotfix # ICAE-7TKRVV, which you can obtain by creating a PMR with IBM Sametime Support and referencing the hotfix number.
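
If you need to prepare this section repeatedly, it can be generated as text. The Python sketch below reproduces the [Debug-STCommunity] section shown above; using it with any other component name is an assumption that the section naming follows the same [Debug-<component>] pattern, which this article does not confirm. The function name is hypothetical.

```python
SNIFF_FLAGS = ("VP_SNIFF", "UCM_SNIFF", "VP_DELAY_SNIFF", "UCM_DELAY_SNIFF")


def component_debug_section(component):
    """Build a per-component debug section: full traces on, sniff flags off."""
    lines = [f"[Debug-{component}]", "VP_TRACE_ALL=1"]
    lines.extend(f"{flag}=0" for flag in SNIFF_FLAGS)
    return "\n".join(lines)
```

Calling component_debug_section("STCommunity") returns the six lines shown above, ready to paste into the Sametime.ini.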

Sametime Runtime Debug tool


The Sametime Runtime Debug Tool (StdebugTool.exe) is an internal tool provided by IBM that lets you set server trace flags for particular components, whenever you need to, without restarting the server.

For more information, refer to Sametime Info Center topics, “Lotus Sametime Runtime Debug Tool” and “Running the Stdebugtool.exe utility.”

Reviewing the NSD and Sametime logging together


Now that you have captured all this vital information, how do you read it? You could simply open a PMR and ask Support to review it; however, first use the tips in this section so that you can give Support a more precise and technical update.

First, open the resulting NSD file in a text editor. Notepad works, but Notepad++ and UltraEdit handle large files well and have better search functionality.

The NSD file is generated in the IBM_TECHNICAL_SUPPORT folder. To find the most recent file, sort the folder by last-modified date. The file name is structured as follows:
nsd_platform_servername_date@time
nsd_W32I_sametime852ifr1_2012_11_12@15_59_55
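
Because the date and time are part of the file name, the most recent NSD can also be picked out by name. This is a minimal Python sketch based only on the name structure shown above; the function names are hypothetical, and any file extension after the time stamp is ignored.

```python
import re
from datetime import datetime

# nsd_<platform>_<servername>_<YYYY_MM_DD>@<HH_MM_SS>
NSD_NAME = re.compile(
    r"^nsd_(?P<platform>[^_]+)_(?P<server>.+)_"
    r"(?P<date>\d{4}_\d{2}_\d{2})@(?P<time>\d{2}_\d{2}_\d{2})"
)


def parse_nsd_name(filename):
    """Return (platform, server, datetime) for an NSD file name, or None."""
    match = NSD_NAME.match(filename)
    if match is None:
        return None
    stamp = datetime.strptime(
        match["date"] + "@" + match["time"], "%Y_%m_%d@%H_%M_%S"
    )
    return match["platform"], match["server"], stamp


def newest_nsd(filenames):
    """Return the NSD file name with the latest time stamp, or None."""
    parsed = [(parse_nsd_name(name), name) for name in filenames]
    dated = [(info[2], name) for info, name in parsed if info is not None]
    return max(dated)[1] if dated else None
```

For the example name above, parse_nsd_name returns the platform W32I, the server name sametime852ifr1, and the time stamp 2012-11-12 15:59:55.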
Once located, we can open the file and review the details within. What we see at first glance is the Host Name, the OS version, and the Domino version (see figure 8).

Figure 8. NSD file contents



When dealing with NSD files we can simply check for keywords like “fatal” or “panic”, and we should find a stack that looks something like that shown in figure 9 (on Windows).

Figure 9. Crash stack



This is a good starting point for our investigation, and we can drill down a little more. If you see values beside stpolicy (in the above example, “stpolicy: 1078: 0e78”), these are the process ID and thread ID.

If you search for these in the remainder of the NSD, you will find a section called: .Mapped To: PThread [stpolicy: 1078: 0e78] , where you should see a list of databases that this process was accessing at the time of the crash.

This could be one of the databases we covered above, and we could be dealing with corruption or some other issue, but it's a good head start on your investigation into what caused this server to crash.
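
The keyword search and the follow-up search for the process and thread IDs can be sketched in Python as follows. This is an illustration under stated assumptions: it relies only on the “[stpolicy: 1078: 0e78]” label form quoted above, tolerates varying spaces inside the brackets, and the function names are hypothetical.

```python
import re


def keyword_hits(lines, keywords=("fatal", "panic")):
    """Return (line_number, text) for lines containing any keyword, ignoring case."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(word in lowered for word in keywords):
            hits.append((number, line.rstrip()))
    return hits


def lines_for_thread(lines, process, pid_hex, tid_hex):
    """Return numbers of lines carrying the [process: pid: tid] label."""
    label = re.compile(
        r"\[\s*" + re.escape(process) + r":\s*" + re.escape(pid_hex)
        + r":\s*" + re.escape(tid_hex) + r"\s*\]",
        re.IGNORECASE,
    )
    return [n for n, line in enumerate(lines, start=1) if label.search(line)]
```

Run keyword_hits first to find the crash stack, then pass the process name and IDs found there to lines_for_thread to jump to the sections that list the open databases.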

Sometimes there is no fatal or panic thread, and in that case we can check the sequence of events that led up to the crash. The process listing shows the order in which processes were launched. If we find the nsd.exe process there, the process listed with it is generally the last one that was running before NSD was automatically launched. For example:

Check the <@@ ------ System Data -> Processes (Time HH:MM:SS) ------ @@> section of the NSD:
11c0 1078 0 09/28 02:01:38 ["C:\Lotus\Domino\nsd.exe" -dumpandkill -termstatus 5 -panicdirect -crashpid 4216 -crashtid 3704 -runtime 300: 11c0]
-> 26a4 0214 0 09/27 15:56:29 [C:\Lotus\Domino\StUsers.exe: 26a4]
Even though we do not have a full NSD, with this information we can check the remainder of the Sametime logs that were collected, so we can still determine what process caused the crash.
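
Note that the -crashpid and -crashtid values on the nsd.exe command line are decimal, while the process listing uses hexadecimal IDs. In the example above, 4216 is 1078 in hexadecimal, which is the value in the second column of the nsd.exe line. The following minimal Python sketch does the conversion; the function name is hypothetical.

```python
import re


def crash_ids_hex(nsd_command_line):
    """Return (pid_hex, tid_hex) from -crashpid/-crashtid, or None if absent."""
    pid = re.search(r"-crashpid\s+(\d+)", nsd_command_line)
    tid = re.search(r"-crashtid\s+(\d+)", nsd_command_line)
    if pid is None or tid is None:
        return None
    # Four-digit, zero-padded lowercase hex, as the IDs appear in the NSD.
    return format(int(pid.group(1)), "04x"), format(int(tid.group(1)), "04x")
```

The returned pair can then be searched for in the NSD and in the Sametime logs to identify the crashing process and thread.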

When you do this for manual NSDs, you will see that cmd was launched before nsd.exe. For example:
1ce0 668 500 11/12 15:43:34 [ cmd: 1ce0]
290 1ce0 500 11/12 15:59:55 [C:\Program Files (x86)\IBM\Lotus\Domino\nsd: nsd: 0290]
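
In these two lines, the second column of the nsd line (1ce0) is the ID of the cmd process in the first column of the line above it, which suggests that the second column is the parent process ID. Under that assumption, the following Python sketch parses process-listing lines and looks up which process launched NSD; the column meanings are inferred from the examples in this article, and the function names are hypothetical.

```python
import re

# <pid> <parent pid> <third column> <MM/DD> <HH:MM:SS> [<command>]
ROW = re.compile(
    r"^\s*(?:->\s*)?(?P<pid>[0-9a-fA-F]+)\s+(?P<ppid>[0-9a-fA-F]+)\s+\S+\s+"
    r"(?P<date>\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+\[(?P<command>.*)\]\s*$"
)


def parse_rows(lines):
    """Parse process-listing lines into dictionaries; skip lines that do not match."""
    rows = []
    for line in lines:
        match = ROW.match(line)
        if match:
            rows.append({
                "pid": int(match["pid"], 16),
                "ppid": int(match["ppid"], 16),
                "started": match["date"] + " " + match["time"],
                "command": match["command"].strip(),
            })
    return rows


def launcher_of(rows, fragment):
    """Return the row of the process that launched the first command containing fragment."""
    by_pid = {row["pid"]: row for row in rows}
    for row in rows:
        if fragment.lower() in row["command"].lower():
            return by_pid.get(row["ppid"])
    return None
```

For the two lines above, launcher_of(rows, "nsd") returns the cmd row, which confirms that the NSD was started manually from a Command prompt.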
Once we have the time of the crash and the possible Sametime task that caused it, we can correlate this data with the Sametime debugging that we have under the trace folder. This directory is located under the Domino Data directory.

The Sametime.log is the main log that we review here to see all the Sametime services initializing or terminating (see figure 10). The “I” in the first column stands for Informational, whereas a “W” is for Warning. The next columns list the process/application name, then the date and time, and the action the process/application is performing.

Figure 10. Example Sametime.log


This log shows us the entire startup of the Sametime services/applications and reports whether any of the services failed. When Support reviews this type of crash, they start from the time at which the NSD was automatically generated and move backward, so that they can see the sequence of events prior to the Sametime Server crashing.

You can also use the Domino Console log or the log.nsf for this purpose, which shows more information on a Domino level, while the Sametime.log provides a more detailed description of the native Sametime processes.

Conclusion


Hopefully this article has given any administrator of an IBM Sametime Community Server the basic knowledge and understanding needed to troubleshoot a Sametime crash on Domino. In some cases you may not be able to solve the issue yourself, but you will be able to provide the appropriate Sametime diagnostic data and environment information to IBM through the PMR process, thus speeding up the resolution of your issue.

Tell us what you think


Please visit this link to take a one-question survey about this article:
http://www.surveymonkey.com/s/9Q6ZKGN

Resources


developerWorks® IBM Sametime product page:
http://www.ibm.com/developerworks/lotus/products/instantmessaging/

developerWorks IBM Lotus Notes and Domino product page:
http://www.ibm.com/developerworks/lotus/products/notesdomino/

About the author


Cormac O'Leary is a Senior Sametime Software Engineer based at IBM's Mulhuddart, Ireland, Lab, who has worked for Lotus Customer Support for the past six years. The past four years he has been a member of the Sametime Level 2 Team, before which he was on the Domino Crash and Performance Team. He was awarded the role of Support's Main Subject Matter Expert (SME) for Sametime Crash. You can reach him at oleacorm@ie.ibm.com.

